Overview
Carnegie Mellon University
■ Risk management frameworks
  ● Which human is a baseline driver?
  ● Risk mitigation is not safety
■ Uncertainty as a limiting factor
  ● Predicting safety before deployment
  ● Field feedback to manage uncertainty
■ A broader view of Safe Enough
  ● Ethical considerations
  ● Hierarchical model of safety needs
  ● Deployment criteria
ADS = Automated Driving System (Car drives; people can sleep)

ADS Technology: Sold Based on Safety

Waymo VSSA https://bit.ly/2QuYhai

We're Building a Safer Driver for Everyone

Self-driving vehicles hold the promise to improve road safety and offer new mobility options to millions of people. Whether they're saving lives or helping people run errands, commute to work, or drop kids off at school, fully self-driving vehicles hold enormous potential to transform people's lives for the better.

Safety is at the core of Waymo's mission - it's why we were founded over a decade ago as the Google Self-Driving Car Project.

Ford VSSA https://bit.ly/3njionT

Newsworthy crashes might not predict safety
  ● Crewed testing is not autonomous
  ● Crash reports need a denominator
■ Need a framework for evaluating safety beyond the news cycle
[News headline image: SELF-DRIVING CARS - How terrible software design decisions led to Uber's deadly 2018 crash. NTSB says the system "did not include consideration for jaywalking ..."]

Ethics: The Blame Game
■ Companies blame human drivers for bad news
  ● Humans are terrible at supervising automation
  ● Maybe driver monitoring helps(?)
■ The Moral Crumple Zone:
  ● Blame the most convenient human for failing to mitigate technical malfunctions
■ Regulatory strategy: computer is driver
  ● Not a legal person, so ... crashes are nobody's fault (???)

How About A Robot Driver Test?
■ Written test
  ● Does ADS know traffic laws & behaviors?
■ Road test
  ● Can ADS obey traffic laws?
  ● Can ADS negotiate effectively with human drivers?
  ● Can ADS resolve potentially ambiguous situations?
■ Being a 16 year old human
  ● How do we measure ADS judgment maturity?
  ● Autonomous systems struggle with novelty, unknowns
→ Need safety engineering, not just a driver test

Setting The Risk Goal
■ MEM - Minimum Endogenous Mortality
  ● System risk has minimal effect on overall risk
■ ALARP - As Low As Reasonably Practicable
  ● Reduce identified risks unless cost is extreme
■ NMAU - "Nicht Mehr Als Unvermeidbar" (no more than unavoidable)
  ● Reduce identified risks within reasonable cost
■ SIL - Safety Integrity Level approaches
  ● Engineering rigor applied to mitigate risks
■ GAMAB - "Globalement Au Moins Aussi Bon" (globally at least as good)
  ● At least as good as an existing system (e.g., a human driver)

Positive Risk Balance (PRB)
■ Utilitarian GAMAB approach
  ● 36,096 fatalities (1.10/100M miles)
  ● 2,740,000 injuries
  ● 6,756,000 police-reported crashes
  ● Data includes drunk drivers, speeders, no seat belts
  [Traffic Safety Facts: DOT HS 813 060 & DOT HS 813 021]
→ Expect zero deaths in a 10M mile testing campaign
■ The averages do not necessarily apply
  ● Which driver?
  ● Under what conditions?
  ● Driving which vehicle?
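
The "expect zero deaths" point can be checked with a quick calculation. The following is a minimal sketch, assuming fatalities arrive as a Poisson process at the 1.10 per 100M miles average quoted above. The Poisson model and the function names are illustrative assumptions, not part of the original slides.

```python
import math

FATALITIES_PER_100M_MILES = 1.10  # US average quoted on the slide


def expected_fatalities(test_miles: float,
                        rate_per_100m: float = FATALITIES_PER_100M_MILES) -> float:
    """Expected number of fatalities over test_miles at the given average rate."""
    return rate_per_100m * test_miles / 100e6


def prob_zero_fatalities(test_miles: float,
                         rate_per_100m: float = FATALITIES_PER_100M_MILES) -> float:
    """Poisson probability of seeing no fatalities at all (assumed model)."""
    return math.exp(-expected_fatalities(test_miles, rate_per_100m))


if __name__ == "__main__":
    miles = 10e6  # the 10M mile testing campaign from the slide
    print(f"expected fatalities: {expected_fatalities(miles):.2f}")
    print(f"P(zero fatalities):  {prob_zero_fatalities(miles):.3f}")
```

Under these assumptions an average human driver population would finish roughly nine out of ten such campaigns with no fatality, so a clean 10M mile record by itself says little about PRB.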

■ ~100M miles/fatal mishap for human drivers
  ● 28% Alcohol impaired/Driving Under Influence
  ● 26% Speed-related
  ● 9% distracted driving
  ● 2% drowsy
  (total > 100% due to multiple factors in some mishaps)
  [DOT HS 813 060 & DOT HS 813 021]
■ Fully functional drivers are much safer
■ New AV has better safety than 10+ year old "average" car
■ Better than an unimpaired, undistracted driver in new car

Driver Age Affects Crash Rates
■ Are older drivers worse? (caution - not the whole story!)
[Chart: Police-Reported Crashes per 100M VMT]
→ Better than a middle-aged driver

Region Affects "Safe Enough" Value
■ Fatality averages for 2019 (IIHS)
  ● Deaths/100K people: DC [value illegible], US 11.0, WY 25.4
  ● Deaths/100M miles: MA 0.51, US 1.11, SC 1.73
■ Fatal crash type [IIHS Fatality Fact Sheets State by State; DOT HS 813 060]
  ● DC: highest pedestrian rate (39%)
  ● NY, FL, DE: highest bicycle rate (5%)
  ● Fatalities per 100M miles: Urban 0.86 vs. Rural 1.65
  ● What about day/night, weather, etc.?
→ Better in same conditions as AV operations
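
To see how much the choice of baseline moves the "safe enough" target, the per-mile rates on this slide can be turned into average miles between fatalities. This is only a sketch: the table mixes state averages with the urban/rural split purely for comparison, and the function names are hypothetical.

```python
# Fatalities per 100M vehicle miles, copied from the slide (2019 figures).
FATALITIES_PER_100M_MILES = {
    "MA": 0.51,
    "US": 1.11,
    "SC": 1.73,
    "urban": 0.86,
    "rural": 1.65,
}


def miles_per_fatality(rate_per_100m: float) -> float:
    """Convert a rate per 100M miles into average miles between fatalities."""
    return 100e6 / rate_per_100m


def baseline_spread(rates: dict) -> float:
    """Ratio between the highest and lowest fatality rate in the table."""
    return max(rates.values()) / min(rates.values())


if __name__ == "__main__":
    for name, rate in FATALITIES_PER_100M_MILES.items():
        print(f"{name:>5}: {miles_per_fatality(rate) / 1e6:6.1f}M miles per fatality")
    print(f"highest/lowest ratio: {baseline_spread(FATALITIES_PER_100M_MILES):.2f}")
```

The highest and lowest rates here differ by more than a factor of three, which is why the baseline has to match the conditions the AV actually operates in.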

When Do We Deploy?
■ Assume we determined a human driver baseline for comparison
  ● Competent, unimpaired middle-age driver
  ● Same operational conditions as AV (location, time of day, weather, ...)
■ RAND report says only 10% better than human driver is a safety win
  [RAND RR2150: "The Enemy of Good: Estimating the Cost of Waiting for Nearly Perfect Automated Vehicles", Nidhi Kalra and David G. Groves]
  ● But, this assumes accurate estimate of safety is available before deployment
  ● What if estimate is 5x too optimistic?
■ Need to address uncertainty
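
The "5x too optimistic" question is simple arithmetic, sketched below. Relative risk is expressed against the human baseline (0.9 means 10% better), and the optimism factor is the slide's hypothetical 5x. The function names are illustrative, and nothing here models the RAND analysis itself.

```python
def actual_relative_risk(predicted_relative_risk: float,
                         optimism_factor: float) -> float:
    """Relative risk vs. the human baseline if the prediction was too optimistic.

    predicted_relative_risk: predicted AV risk / human risk (0.9 = 10% better).
    optimism_factor: how many times worse reality is than the prediction.
    """
    return predicted_relative_risk * optimism_factor


def required_prediction(optimism_factor: float,
                        target_relative_risk: float = 1.0) -> float:
    """Largest predicted relative risk that still meets the target in reality."""
    return target_relative_risk / optimism_factor


if __name__ == "__main__":
    print(f"actual risk vs. human: {actual_relative_risk(0.9, 5.0):.1f}x")
    print(f"prediction needed to break even: {required_prediction(5.0):.1f}x")
```

With a 5x optimism error, a predicted 10% improvement becomes 4.5 times the human risk, and the prediction would need to show an 80% improvement just to break even.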


Validation Via Brute Force Road Testing?
■ If 200M miles/critical mishap...
  ● Test 3x-10x longer than mishap interval
  → Need 2 Billion miles of testing
■ That's ~50 round trips on every road in the world
  ● With fewer than 10 critical mishaps
  ● Even more testing if you find a defect and redo some testing
■ Road testing leaves uncertainty
[Map image label: 4.03 million mi]
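
One way to see where a "3x or more" multiplier can come from is the standard zero-failure test calculation. This is a sketch, assuming mishaps arrive as a Poisson process and that the whole test is mishap-free. The function name is hypothetical, and this is not necessarily the derivation behind the slide's numbers.

```python
import math


def mishap_free_miles_needed(miles_per_mishap: float, confidence: float) -> float:
    """Mishap-free test miles needed to support a target mishap interval.

    Assumes Poisson arrivals: if the true interval were shorter than
    miles_per_mishap, a mishap-free run this long would occur with
    probability below (1 - confidence).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) * miles_per_mishap


if __name__ == "__main__":
    target = 200e6  # 200M miles per critical mishap, from the slide
    for conf in (0.95, 0.99):
        miles = mishap_free_miles_needed(target, conf)
        print(f"{conf:.0%} confidence: {miles / 1e6:.0f}M miles ({miles / target:.1f}x)")
```

At 95% confidence the multiplier is about 3.0x, which matches the low end of the 3x-10x range. Any mishap observed during the campaign pushes the required mileage higher.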

  ● Highly scalable; fidelity vs. cost tradeoff
  ● Need to build highly detailed models (modeling errors?)
  ● Challenge of matching real world data into simulation models
  ● Only tests things you have thought of

How Much Do You Trust Validation?
■ Would you put a child in front of an AV validated with:
  ● 10,000M mile sims
    ... perhaps with a simulator error?
  ● Based on 100M miles road data collected
    ... perhaps with scenario analysis errors?
  ● Validated by 10M miles of road testing
    ... that missed the above errors?
  ● And 10K repetitions of closed course testing
    ... with standard dummies instead of people
  ● Built with biased perception training data?
  ● Using software binaries & tools
    ... with no safety qualification?
© 2022 Philip Koopman


Engineering Rigor
■ Testing alone is insufficient for life-critical systems
  ● So we also use engineering rigor
■ Can you trust the system itself?
  ● Is it engineered for safety?
  ● Were standards and best practices used?
  ● Is there a safety case documenting all this?
■ Can you trust your validation process?
  ● Did you engineer the simulations properly?
  ● Did you design the validation campaign properly?

Identifying & Mitigating Hazards
■ ISO 26262: Hazard Analysis and Risk Assessment (HARA)
  ● Identify and mitigate risks per ASIL requirements
■ ISO 21448: Identify and mitigate unsafe scenarios
  ● Safety of the Intended Function (SOTIF)
  ● Reduce "unknown unsafe" area
  ● Deploy at acceptable residual risk
[Diagram: scenario space split by Known/Unknown and Safe/Unsafe into Known safe scenarios (Area 1), Known unsafe scenarios (Area 2), and Unknown unsafe scenarios (Area 3); the UNKNOWN UNSAFE AREA is highlighted]

Field Engineering Feedback
■ Expected risk has a mean + uncertainty
  ● Deploy only when mean is acceptable
  ● But there will be uncertainty
    - Missed edge cases during road testing
    - Unknown gaps in validation plan
    - Unknown unknowns in general
■ Solution: manage uncertainty
  ● Safety Performance Indicators (SPIs)
    - SPI violation means safety argument has a defect (surprise!)
  ● "Surprise" arrival rates could help estimate safety case uncertainty
    - Start during validation; continue after deployment
[Chart labels: PRB THRESHOLD; ACTUAL SAFETY MIGHT BE HERE!]
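
Tracking "surprise" arrival rates could start as simply as counting SPI violations per unit of exposure in each phase. The sketch below uses hypothetical mileage and violation counts for illustration only, and the class and function names are not taken from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureWindow:
    label: str
    miles: float
    spi_violations: int  # count of "surprises" observed in this window


def surprises_per_million_miles(window: ExposureWindow) -> float:
    """Arrival rate of SPI violations, normalized by exposure."""
    return window.spi_violations / (window.miles / 1e6)


# Hypothetical numbers, for illustration only.
WINDOWS = [
    ExposureWindow("validation", miles=2e6, spi_violations=8),
    ExposureWindow("early deployment", miles=5e6, spi_violations=5),
]

if __name__ == "__main__":
    for w in WINDOWS:
        rate = surprises_per_million_miles(w)
        print(f"{w.label}: {rate:.1f} surprises per million miles")
```

A rate that keeps falling across windows suggests the safety case is maturing. A rate that stays flat or rises suggests the remaining uncertainty is larger than assumed.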


ANSI/UL 4600 SPIs and Lifecycle Feedback
■ SPI: direct measurement of safety case claim failure
  ● Independent of reasoning ("claim is X ... yet here is ~X")
■ A falsified safety case claim:
  ● Safety case has some defect
  ● Not (necessarily) imminent loss event
■ Root cause analysis might reveal:
  ● Product or process defect
  ● Invalid safety argument
  ● Issue with supporting evidence
  ● Assumption error
[Diagram: safety case structure with ARGUMENT 1, EVIDENCE 1, Sub-CLAIM 1A, Sub-CLAIM 1B, Sub-ARGUMENT 1A, and EVIDENCE 1A; an SPI {Metric, Threshold} asks "Is Claim False?"]
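
The {Metric, Threshold} pairing can be pictured as a small record attached to a safety case claim. The following is a minimal sketch. The class, its fields, and the example claim, metric name, and threshold are all hypothetical and are not drawn from ANSI/UL 4600.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPI:
    claim: str        # safety case claim the SPI is attached to
    metric: str       # name of the measured quantity
    threshold: float  # claim counts as falsified when the metric exceeds this

    def claim_is_false(self, observed: float) -> bool:
        """True when field data contradicts the claim, whatever the argument said."""
        return observed > self.threshold


# Hypothetical claim, metric, and threshold, for illustration only.
EXAMPLE_SPI = SPI(
    claim="Vehicle keeps adequate clearance from pedestrians",
    metric="clearance_violations_per_million_miles",
    threshold=0.1,
)

if __name__ == "__main__":
    for observed in (0.0, 0.5):
        status = "FALSIFIED" if EXAMPLE_SPI.claim_is_false(observed) else "holds"
        print(f"{EXAMPLE_SPI.metric} = {observed}: claim {status}")
```

A violation here flags a defect somewhere in the safety case and triggers root cause analysis. It does not by itself mean a loss event is imminent.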

Field Engineering Feedback
■ Architectural support for lifecycle field feedback
  ● Safety Performance Indicators (SPI) data linked to safety case
    - Transition from recall model to continuous improvement
[Diagram labels: SOTIF TRIGGERING EVENTS; HAZARD ANALYSIS; RUN-TIME SAFETY MONITOR; SPI Data (three flows); Recalls]

Ethics: Risk vs. Safety
■ Cost of excessive risk drives improvement
  ● Reducing risk tends to improve safety, but...
■ Affordable risk might exceed acceptable safety
  ● Life insurance for combat military personnel
  ● Commercial space launch insurance
  ● Cost of fatality settlement compared to $2M-$5M/day burn rate
■ Risk management is not enough for acceptable safety
  ● Risk transfer (occupants vs. pedestrians)
  ● Existential pressure for company to deploy with unproven safety

Ethics: Deployment Governance
■ #1 ethical issue in AVs is deployment governance
  ● Who decides when to deploy based on what?
■ Pressure for aggressive deployments
  ● Missing independent technical oversight
■ Ethical deployment should address:
  ● Publicly disclosed safety prediction
  ● Inclusion of stakeholder concerns
  ● Transparency of data & processes
  ● Accountability for any losses
  ● Non-discrimination in operational concept

What People Mean By "Safe"
■ Human drivers are bad, so computers will be safe
  ● Industry rhetorical talking points are ubiquitous
■ "Safety is our #1 priority"
■ Safe driving behavior
  ● Follows traffic laws; good roadmanship
■ Tested/simulated for millions of miles
■ Risk is managed via insurance
■ Conforms to safety standards
■ Positive Risk Balance
■ Safety cases supported by evidence


Hierarchy of Concurrent Safety Needs
AV SAFETY HIERARCHY OF NEEDS (top to bottom):
  SOCIO-TECHNICAL: Lifecycle-oriented safety culture; Stakeholder expectations
  SYSTEM SAFETY: ANSI/UL 4600 - safety case; ISO 21448 - insufficiencies; ISO 26262 - internal faults
  HAZARD ANALYSIS: Engineering risk mitigation
  DEFENSIVE DRIVING: AV avoids driving risk
  BASIC DRIVING FUNCTIONALITY: Can the AV drive?

Summary: Safe Enough AV Deployment
■ Don't forget safety while public road testing - SAE J3018
■ Acceptable safety is more than just a risk number
  ● Good human PRB + safety factor for unknowns
  ● Safety & security industry engineering standards
  ● Ethical & stakeholder concerns addressed
■ Safety case
  ● Transparent argument based on evidence
  ● Lifecycle uncertainty management via feedback
■ Deployment Governance - #1 ethical issue
  ● Stakeholders involved in safety criteria & decision
  ● Safety culture assures fair dealing on decision


